Information-theoretic method for clustering and merging imprecise data

ABSTRACT

An information-theoretic method clusters and merges bi-variate normal data or ‘error ellipses’ lying in a plane. Two or more error ellipses are clustered and then merged into a single error ellipse if the information lost in the merging process is sufficiently small. This criterion is numerically implemented and tested in a code developed for this purpose.

CROSS REFERENCE TO RELATED APPLICATIONS

The present invention claims priority to U.S. Provisional Application No. 60/610,693 entitled “Information Theoretic Method For Clustering and Merging Imprecise Data” filed on Sep. 17, 2004 and listing as inventors William Peter and Donald Lemons.

STATEMENT OF GOVERNMENTAL INTEREST

This invention was made with United States Government support under Contract No. MDA 904-01-F-0371 with the Maryland Procurement Office, and the United States Government has certain rights in this invention.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to methods for clustering and merging imprecise data and, more particularly, to methods for clustering and merging of imprecise data in the form of bivariate normal random variables.

2. Brief Description of Prior Developments

Consider the problem of tracking, with an imprecise measuring instrument, the individual positions of a group of fish as they swim around in an effectively two-dimensional plane, say, near the surface of a lake. Furthermore, suppose that reflections or rapid swimming during the measurement process cause multiple images of a single fish to be recorded. Because the data is imprecise, each position is represented by a pair of random position variables. Therefore, the problem of eliminating redundant positions referring to a single fish requires solving the equivalent analytical problem of first clustering and then merging multiple bi-variate random variables.

SUMMARY OF INVENTION

Assuming that these bi-variates are normal, the random position associated with each piece of data is completely determined by two means, two variances, and a covariance. Alternatively, each bi-variate normal is represented by a particular “error ellipse”, defined as a contour of constant probability density, with given position, size, and orientation.

The present invention is an information-theoretic solution to the problem of clustering and merging error ellipses. In particular, ellipses are merged only when such merging loses sufficiently little quantifiable information, given that the replacement ellipse preserves the pooled means, variances, and covariances of the original ellipses. Limiting information loss provides a true similarity criterion since only ellipses that are similar in position, size, and orientation are merged. This criterion is robust and reduces to a single Boolean test that can be applied either pair-wise sequentially or all at once to a group of ellipses. The criterion can be used to merge and thus to eliminate redundancies in, or to reduce and simplify, bi-variate normal data, but it can also be used to cluster error ellipses without subsequent merging.

Cluster analysis has a long history and a large literature of application in the social and natural sciences as disclosed in Aldenderfer, M. S. and Roger K. Blashfield, Cluster Analysis (Sage, Newbury Park, 1984); Everitt, Brian, Cluster Analysis (Heinemann, London, 1974); and Kaufman, Leonard and Peter J. Rousseeuw, Finding Groups in Data (Wiley, New York, 1990), the contents of all of which are incorporated herein by reference. J. H. Ward's entropy criteria for hierarchically grouping data, which are disclosed in Ward, J. H., “Hierarchical grouping to optimize an objective function,” J. Am. Statist. Ass., 58, 236-244 (1963), the contents of which are also incorporated herein by reference, are also relevant to the present invention. However, there are several important structural differences that distinguish typical cluster analysis techniques from the information-theoretic method of the present invention. (1) Cluster analysis groups data conceived as realizations of random variables, whereas in the present invention bi-variate random variables are grouped. (2) Cluster analysis stops at forming groups of data, whereas error ellipses are grouped in order to merge them into a single replacement ellipse. (3) Cluster analysis decides whether or not to group data by inspecting the group's statistical properties, whereas in the present invention decisions are made by comparing the means, variances, and covariances of the ellipses to be clustered with those of their potential replacement ellipse.

BRIEF DESCRIPTION OF THE FIGURES

The above described features and advantages of the present invention will be more fully appreciated with reference to the detailed description and appended figures, in which:

FIG. 1 depicts a data fusion of imprecise geospatial data in the form of error ellipses, according to an embodiment of the present invention.

FIG. 2 depicts a data fusion of imprecise geospatial data in the form of error ellipses where one error ellipse has a different orientation, size and shape, according to an embodiment of the present invention.

DETAILED DESCRIPTION

I. How to Merge Two Error Ellipses

Because each error ellipse in the x-y plane represents a bi-variate normal random variable, each is uniquely described by two means, $\mu_{x}$ and $\mu_{y}$, two variances, $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$, and one covariance, $\sigma_{xy}$. The probability density function of such a bi-variate normal is

$$p(x,y) = \frac{\exp\left[\frac{-1}{2\left(1-\rho^{2}\right)}\left\{\left(\frac{x-\mu_{x}}{\sigma_{x}}\right)^{2} - 2\rho\left(\frac{x-\mu_{x}}{\sigma_{x}}\right)\left(\frac{y-\mu_{y}}{\sigma_{y}}\right) + \left(\frac{y-\mu_{y}}{\sigma_{y}}\right)^{2}\right\}\right]}{2\pi\sigma_{x}\sigma_{y}\sqrt{1-\rho^{2}}} \tag{1}$$

where $\rho = \sigma_{xy}/(\sigma_{x}\sigma_{y})$ is the correlation coefficient, $\sigma_{xy} = \langle XY\rangle - \langle X\rangle\langle Y\rangle$, $\sigma_{x}^{2} = \langle X^{2}\rangle - \langle X\rangle^{2}$, $\sigma_{y}^{2} = \langle Y^{2}\rangle - \langle Y\rangle^{2}$, $\mu_{x} = \langle X\rangle$, and $\mu_{y} = \langle Y\rangle$. In these definitions capital letters denote random variables and angle brackets denote the mean operator, as in

$$\left\langle f(X,Y)\right\rangle = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y)\,p(x,y)\,dx\,dy. \tag{2}$$
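As an illustration of equation (1), the following Python sketch evaluates the density and converts the variances and covariance into the geometry of an error ellipse. The function names, the argument layout, and the use of NumPy are assumptions of this sketch and are not taken from the code referred to in the abstract. The scale factor for a contour enclosing probability P, namely the square root of −2 ln(1 − P), follows from the chi-square distribution with two degrees of freedom.

```python
import math
import numpy as np

def bivariate_normal_pdf(x, y, mu_x, mu_y, var_x, var_y, cov_xy):
    """Evaluate equation (1) at the point (x, y)."""
    sx, sy = math.sqrt(var_x), math.sqrt(var_y)
    rho = cov_xy / (sx * sy)                      # correlation coefficient
    zx, zy = (x - mu_x) / sx, (y - mu_y) / sy     # standardized offsets
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def error_ellipse(var_x, var_y, cov_xy, prob=0.95):
    """Return (semi_major, semi_minor, angle) of the constant-density
    contour that encloses probability `prob`; angle is in radians,
    measured from the x axis to the major axis."""
    cov = np.array([[var_x, cov_xy], [cov_xy, var_y]], dtype=float)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    scale = math.sqrt(-2.0 * math.log(1.0 - prob))
    semi_minor, semi_major = scale * np.sqrt(eigvals)
    vx, vy = eigvecs[:, 1]                        # direction of the major axis
    return float(semi_major), float(semi_minor), math.atan2(vy, vx)
```

With the scale factor equal to 1, the area enclosed by the contour is $\pi\sigma_{x}\sigma_{y}\sqrt{1-\rho^{2}}$. The sign of an eigenvector is arbitrary, so the returned angle is defined only up to a half turn.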

In the following, the subscripts “1”, “2”, and “m” are used to denote quantities associated with the two statistically independent bi-variate normals “1” and “2” and the merged bi-variate normal “m.” Thus, the position coordinates $(X_{1}, Y_{1})$ are described by $p_{1}(x,y)$, given by (1) with $\mu_{1x}$, $\mu_{1y}$, $\sigma_{1x}^{2}$, $\sigma_{1y}^{2}$, and $\sigma_{1xy}$ (or $\rho_{1}$), and $(X_{2}, Y_{2})$ by $p_{2}(x,y)$, likewise given by (1), with $\mu_{2x}$, $\mu_{2y}$, $\sigma_{2x}^{2}$, $\sigma_{2y}^{2}$, and $\sigma_{2xy}$ (or $\rho_{2}$). We assume that random variables $X_{1}$ and $Y_{1}$ are statistically independent of variables $X_{2}$ and $Y_{2}$ so that, e.g., $\langle X_{1}X_{2}\rangle = \langle X_{1}\rangle\langle X_{2}\rangle = \mu_{1x}\mu_{2x}$.

Points “1” and “2” are merged into the single point “m” described by the probability density $p_{m}(x,y)$ given by (1) with parameters $\mu_{mx}$, $\mu_{my}$, $\sigma_{mx}^{2}$, $\sigma_{my}^{2}$, and $\sigma_{mxy}$. These parameters are, by definition, identical to the means, variances, and covariance of the pooled probability density $[p_{1}(x,y)+p_{2}(x,y)]/2$, i.e., $p_{m}(x,y) = [p_{1}(x,y)+p_{2}(x,y)]/2$ through second order moments. Thus, $\mu_{mx}$ is defined by

$$\mu_{mx} = \left\langle \frac{X_{1}+X_{2}}{2} \right\rangle = \frac{\mu_{1x}+\mu_{2x}}{2} \tag{3}$$

and, also,

$$\mu_{my} = \frac{\mu_{1y}+\mu_{2y}}{2}. \tag{4}$$

In similar fashion

$$\sigma_{mx}^{2} = \left\langle \frac{1}{2}\left(X_{1}^{2}+X_{2}^{2}\right) \right\rangle - \left(\frac{\mu_{1x}+\mu_{2x}}{2}\right)^{2} = \frac{\sigma_{1x}^{2}+\sigma_{2x}^{2}}{2} + \frac{\left(\mu_{1x}-\mu_{2x}\right)^{2}}{4}, \tag{5}$$

$$\sigma_{my}^{2} = \frac{\sigma_{1y}^{2}+\sigma_{2y}^{2}}{2} + \frac{\left(\mu_{1y}-\mu_{2y}\right)^{2}}{4}, \tag{6}$$

and

$$\sigma_{mxy} = \frac{\sigma_{1xy}+\sigma_{2xy}}{2} + \frac{\left(\mu_{1x}-\mu_{2x}\right)\left(\mu_{1y}-\mu_{2y}\right)}{4}. \tag{7}$$

As one might expect, the separations of the means, $(\mu_{1x}-\mu_{2x})$ and $(\mu_{1y}-\mu_{2y})$, of the variables “1” and “2” contribute to the variances and covariance of the merged variable. FIG. 1 illustrates the result of merging two error ellipses.

II. Whether to Merge Two Error Ellipses

In deciding whether to merge two error ellipses we compare the entropy or, equivalently, the “missing information” of the two error ellipses with the entropy of the single ellipse into which the two would be merged. Indeed, it is only comparative entropies that can be coherently defined for continuous random variables. The so-called “self-entropy” of a continuous bi-normal random variable with probability density $p(x,y)$, that is, $-\langle \ln[p(X,Y)]\rangle$ or

$$-\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} p(x,y)\ln[p(x,y)]\,dx\,dy,$$

has been severely criticized not only as capable of assuming negative values but also as being dimensionally indeterminate, as is disclosed in Smith, J. D. H., “Some Observations on the Concepts of Information-Theoretic Entropy and Randomness,” Entropy, 3, 1-11 (2001), the contents of which are incorporated herein by reference. Neither of these two properties has a natural information-theoretic interpretation.

For this reason another quantity, the relative entropy of one ellipse with respect to another, is exploited as disclosed in Jumarie, G., Relative Information (Springer-Verlag, Berlin, 1990) p. 36, the contents of which are incorporated herein by reference. In particular, the relative entropy of ellipse “1” with respect to a reference ellipse “0” is defined by

$$S(1,0) = -\int p_{1}(x,y)\ln\left[\frac{p_{1}(x,y)}{p_{0}(x,y)}\right]dx\,dy \tag{8}$$

where, of course, the primary and reference probability densities $p_{1}(x,y)$ and $p_{0}(x,y)$ must be normalized and positive definite on the same domain. Since regions of the domain for which $p_{1}(x,y) > p_{0}(x,y)$ contribute negatively to the relative entropy and regions for which $p_{1}(x,y) < p_{0}(x,y)$ contribute positively, negative values of $S(1,0)$ are associated with a relatively small, and positive values with a relatively large, error ellipse “1” compared to the area of the reference ellipse “0.” In short, the reference ellipse establishes a metric with which the entropy of a random variable is quantified.
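For two bi-variate normals the integral in (8) has a well-known closed form, because with this sign convention S(1,0) is the negative of the Kullback-Leibler divergence of density “1” from density “0.” The following Python sketch evaluates it. The function name, the use of mean vectors and 2×2 covariance matrices as inputs, and the reliance on NumPy are assumptions made for illustration only.

```python
import numpy as np

def relative_entropy(mean1, cov1, mean0, cov0):
    """Closed-form S(1,0) of equation (8) for two bivariate normals.

    mean1, mean0: length-2 sequences (mu_x, mu_y).
    cov1, cov0:   2x2 covariance matrices [[var_x, cov_xy], [cov_xy, var_y]].
    """
    mean1 = np.asarray(mean1, dtype=float)
    mean0 = np.asarray(mean0, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov0 = np.asarray(cov0, dtype=float)
    d = mean1 - mean0                       # separation of the centers
    inv0 = np.linalg.inv(cov0)              # inverse of the reference covariance
    # Kullback-Leibler divergence of normal "1" from normal "0" in 2 dimensions
    kl = 0.5 * (np.trace(inv0 @ cov1) + d @ inv0 @ d - 2.0
                + np.log(np.linalg.det(cov0) / np.linalg.det(cov1)))
    return -float(kl)                       # equation (8) carries a minus sign
```

For example, two unit circular ellipses whose centers are one unit apart give S(1,0) = −0.5, and any ellipse taken relative to itself gives zero.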

The relative entropy generated when ellipses “1” and “2” are merged into a single ellipse “m” is given by the difference

$$\Delta S = S(m,0) - S(1,0) - S(2,0) \tag{9}$$

between the relative entropy of the merged ellipse and the sum of the relative entropies of the two original ellipses. It can be shown that $\Delta S \geq 0$ for any reference ellipse “0” and also that $\Delta S$ is minimum when the merged ellipse “m” is identified with the reference ellipse “0” or vice-versa. This optimization is adopted, i.e., “0”=“m”. Since the entropy of an ellipse relative to itself necessarily vanishes, e.g., $S(m,m) \equiv 0$, equation (9) becomes

$$\Delta S = -S(1,m) - S(2,m). \tag{10}$$

Equation (10) is, in turn, equivalent to

$$\Delta S = \int p_{1}(x,y)\ln[p_{1}(x,y)]\,dx\,dy + \int p_{2}(x,y)\ln[p_{2}(x,y)]\,dx\,dy - \int \left[p_{1}(x,y)+p_{2}(x,y)\right]\ln[p_{m}(x,y)]\,dx\,dy. \tag{11}$$

However, since $\ln[p_{m}(x,y)]$ is a quadratic polynomial in x and y and the moments, through second order, of $p_{m}(x,y)$ are, by definition, identical to the moments, through second order, of $[p_{1}(x,y)+p_{2}(x,y)]/2$, (11) can also be written as

$$\Delta S = -2\int p_{m}(x,y)\ln[p_{m}(x,y)]\,dx\,dy + \int p_{1}(x,y)\ln[p_{1}(x,y)]\,dx\,dy + \int p_{2}(x,y)\ln[p_{2}(x,y)]\,dx\,dy. \tag{12}$$

Thus, the entropy $\Delta S$ generated by merging two ellipses is the difference between the self-entropy associated with two independent merged ellipses and the self-entropy associated with the two original ellipses. Given that each of the probability densities, $p_{1}(x,y)$, $p_{2}(x,y)$, and $p_{m}(x,y)$, is that of a correlated bi-variate normal, (12) becomes

$$\Delta S = \ln\left[\frac{\sigma_{mx}^{2}\sigma_{my}^{2}\left(1-\rho_{m}^{2}\right)}{\sigma_{1x}\sigma_{1y}\sqrt{1-\rho_{1}^{2}}\;\sigma_{2x}\sigma_{2y}\sqrt{1-\rho_{2}^{2}}}\right]. \tag{13}$$

The argument of the natural logarithm in (13) is the ratio of the square of the area of the merged ellipse, $A_{m}^{2} = \left[\pi\sigma_{mx}\sigma_{my}\sqrt{1-\rho_{m}^{2}}\right]^{2}$, to the product of the areas of the unmerged ellipses, $A_{1} = \pi\sigma_{1x}\sigma_{1y}\sqrt{1-\rho_{1}^{2}}$ and $A_{2} = \pi\sigma_{2x}\sigma_{2y}\sqrt{1-\rho_{2}^{2}}$.
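The pooling formulas (3) through (7) and the entropy generated in (13) can be sketched in a few lines of Python. The tuple layout (mu_x, mu_y, var_x, var_y, cov_xy) and the function names are assumptions of this sketch. It relies on the identity $\sigma_{x}^{2}\sigma_{y}^{2}(1-\rho^{2}) = \sigma_{x}^{2}\sigma_{y}^{2} - \sigma_{xy}^{2}$, so the correlation coefficient never has to be formed explicitly.

```python
import math

def merge_two(e1, e2):
    """Pool two ellipses according to equations (3)-(7).

    Each ellipse is a tuple (mu_x, mu_y, var_x, var_y, cov_xy).
    """
    mx1, my1, vx1, vy1, c1 = e1
    mx2, my2, vx2, vy2, c2 = e2
    dx, dy = mx1 - mx2, my1 - my2                  # separations of the means
    return ((mx1 + mx2) / 2.0,                     # (3)
            (my1 + my2) / 2.0,                     # (4)
            (vx1 + vx2) / 2.0 + dx * dx / 4.0,     # (5)
            (vy1 + vy2) / 2.0 + dy * dy / 4.0,     # (6)
            (c1 + c2) / 2.0 + dx * dy / 4.0)       # (7)

def det_cov(e):
    """var_x * var_y - cov_xy**2, i.e. sigma_x^2 sigma_y^2 (1 - rho^2)."""
    _, _, vx, vy, c = e
    return vx * vy - c * c

def entropy_generated(e1, e2):
    """Delta S of equation (13) for merging ellipses e1 and e2."""
    m = merge_two(e1, e2)
    return math.log(det_cov(m) / math.sqrt(det_cov(e1) * det_cov(e2)))
```

For two unit circular ellipses whose centers are two units apart along x, the pooled x-variance is 2 and the entropy generated is ln 2. Two identical ellipses generate no entropy.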
The criterion that ellipses are merged only when the entropy generated, $\Delta S$, is less than or equal to a critical amount $\Delta S_{crit}$, i.e., only when $\Delta S \leq \Delta S_{crit}$, reduces to a condition on the ratio $A_{m}^{2}/(A_{1}A_{2})$, i.e.,

$$\frac{\sigma_{mx}^{2}\sigma_{my}^{2}\left(1-\rho_{m}^{2}\right)}{\sigma_{1x}\sigma_{1y}\sqrt{1-\rho_{1}^{2}}\;\sigma_{2x}\sigma_{2y}\sqrt{1-\rho_{2}^{2}}} \leq R_{crit} \tag{14}$$

where $R_{crit} = \exp\{\Delta S_{crit}\}$. The actual numerical value of the critical ratio of areas $R_{crit}$ regulates the degree of similarity required for merging. Thus, the smaller $R_{crit}$ (recall that $R_{crit} \geq 1$), the more similarity is required for merging; the larger $R_{crit}$, the less similarity is required.

III. Merging More Than Two Error Ellipses

Suppose we wish to test whether or not a group of n ellipses representing n statistically independent bi-variate normal variables should be clustered and merged. The generalization from two to many ellipses is quite straightforward. First, we determine the parameters describing the merged ellipse in terms of the parameters describing the n unmerged ellipses via the data-pooling algorithm. Thus,

$$\mu_{mx} = \left\langle \frac{1}{n}\sum_{i=1}^{n} X_{i} \right\rangle = \frac{1}{n}\sum_{i=1}^{n}\mu_{ix} \tag{15}$$

and

$$\mu_{my} = \frac{1}{n}\sum_{i=1}^{n}\mu_{iy}. \tag{16}$$

Furthermore,

$$\sigma_{mx}^{2} = \left\langle \frac{1}{n}\sum_{i=1}^{n}\left(X_{i}-\mu_{mx}\right)^{2} \right\rangle \tag{17}$$

$$\phantom{\sigma_{mx}^{2}} = \frac{1}{n}\sum_{i=1}^{n}\sigma_{ix}^{2} + \frac{1}{n}\sum_{i=1}^{n}\left(\mu_{ix}-\mu_{mx}\right)^{2}, \tag{18}$$

$$\sigma_{my}^{2} = \left\langle \frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\mu_{my}\right)^{2} \right\rangle = \frac{1}{n}\sum_{i=1}^{n}\sigma_{iy}^{2} + \frac{1}{n}\sum_{i=1}^{n}\left(\mu_{iy}-\mu_{my}\right)^{2}, \tag{19}$$

and

$$\sigma_{mxy} = \frac{1}{n}\sum_{i=1}^{n}\sigma_{ixy} + \frac{1}{n}\sum_{i=1}^{n}\left(\mu_{ix}-\mu_{mx}\right)\left(\mu_{iy}-\mu_{my}\right). \tag{20}$$

The criterion for merging then becomes

$$S(m,0) - \sum_{i=1}^{n} S(i,0) \leq S_{crit}. \tag{21}$$

On minimizing the entropy generated by merging, i.e., on identifying the merged ellipse with the reference ellipse, “m”=“0”, (21) becomes

$$\sum_{i=1}^{n}\int p_{i}(x,y)\ln\left[\frac{p_{i}(x,y)}{p_{m}(x,y)}\right]dx\,dy \leq S_{crit} \tag{22}$$

or, since, through second order moments, $p_{m} = (1/n)\sum_{i=1}^{n} p_{i}$,

$$\int \sum_{i=1}^{n} p_{i}(x,y)\ln[p_{i}(x,y)]\,dx\,dy - n\int p_{m}(x,y)\ln[p_{m}(x,y)]\,dx\,dy \leq S_{crit}. \tag{23}$$

Given that the $p_{i}(x,y)$ and $p_{m}(x,y)$ represent bi-variate normals, (23) becomes

$$\ln\left[\frac{\left(\sigma_{mx}^{2}\right)^{n/2}\left(\sigma_{my}^{2}\right)^{n/2}\left(1-\rho_{m}^{2}\right)^{n/2}}{\prod_{i=1}^{n}\sigma_{ix}\sigma_{iy}\sqrt{1-\rho_{i}^{2}}}\right] \leq S_{crit} \tag{24}$$

which is equivalent to

$$\frac{\left(\sigma_{mx}^{2}\right)^{n/2}\left(\sigma_{my}^{2}\right)^{n/2}\left(1-\rho_{m}^{2}\right)^{n/2}}{\prod_{i=1}^{n}\sigma_{ix}\sigma_{iy}\sqrt{1-\rho_{i}^{2}}} \leq R_{crit}, \tag{25}$$

where $R_{crit} = \exp\{S_{crit}\}$. The latter is the generalization of criterion (14) to the case of merging n ellipses.
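The n-ellipse pooling of (15) through (20) and the Boolean merge test can be sketched as follows. This is a minimal illustration and not the code mentioned in the abstract: the tuple layout (mu_x, mu_y, var_x, var_y, cov_xy), the function names, and the parameter `s_crit` are assumptions. The entropy generated is written in terms of the determinant $\sigma_{x}^{2}\sigma_{y}^{2}(1-\rho^{2}) = \sigma_{x}^{2}\sigma_{y}^{2} - \sigma_{xy}^{2}$ of each ellipse, with the merged determinant raised to the power n/2 so that the case n = 2 reproduces (13).

```python
import math

def merge_many(ellipses):
    """Pool n ellipses according to equations (15)-(20).

    Each ellipse is a tuple (mu_x, mu_y, var_x, var_y, cov_xy).
    """
    n = len(ellipses)
    mu_x = sum(e[0] for e in ellipses) / n
    mu_y = sum(e[1] for e in ellipses) / n
    var_x = sum(e[2] + (e[0] - mu_x) ** 2 for e in ellipses) / n
    var_y = sum(e[3] + (e[1] - mu_y) ** 2 for e in ellipses) / n
    cov_xy = sum(e[4] + (e[0] - mu_x) * (e[1] - mu_y) for e in ellipses) / n
    return (mu_x, mu_y, var_x, var_y, cov_xy)

def log_det(e):
    """ln of var_x * var_y - cov_xy**2 for one ellipse."""
    return math.log(e[2] * e[3] - e[4] ** 2)

def entropy_generated(ellipses):
    """Entropy generated by merging all ellipses into one."""
    n = len(ellipses)
    merged = merge_many(ellipses)
    return 0.5 * n * log_det(merged) - 0.5 * sum(log_det(e) for e in ellipses)

def should_merge(ellipses, s_crit):
    """Boolean test: merge only if the entropy generated is at most s_crit."""
    return entropy_generated(ellipses) <= s_crit
```

For three unit circular ellipses centered at x = −1, 0, and 1, the pooled x-variance is 5/3 and the entropy generated is 1.5 ln(5/3), roughly 0.77, so the group would be merged with `s_crit = 1.0` but not with `s_crit = 0.5`.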

FIG. 1 depicts an example showing five error ellipses representing five location measurements. From these measurements, the most probable location of a target is determined by constructing a merged error ellipse Nm as discussed above. The loss of information reflected in the merging of the measurements into one error ellipse is determined to be ΔS=5.14. If the loss of information is small relative to a known threshold, the target would have a 95% probability of being within Nm and the most probable position of the target would be at its center.

FIG. 2 depicts an example showing six error ellipses (five of them from the dataset of FIG. 1). The sixth ellipse has a different orientation, center and shape than the other ellipses. Also shown is the merged ellipse of the dataset of FIG. 2, again denoted by Nm. The entropy increase in this case is calculated to be ΔS=9.62, suggesting that the loss of information by merging this sixth, somewhat different, ellipse in the dataset may be significant. In some applications, the relatively large size of Nm and large value of ΔS may be above a given empirical threshold, resulting in a measurement being discarded or perhaps being identified with another target.
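One way to act on such a threshold is to test each new measurement against the groups formed so far and to start a new group, possibly a different target, when the entropy generated would be too large. The Python sketch below is a hypothetical greedy procedure written for illustration; the description above does not prescribe this particular ordering or assignment rule. The tuple layout (mu_x, mu_y, var_x, var_y, cov_xy), the function names, and the parameter `s_crit` are likewise assumptions.

```python
import math

def pooled(ellipses):
    """Pooled (mu_x, mu_y, var_x, var_y, cov_xy) of a list of ellipses."""
    n = len(ellipses)
    mu_x = sum(e[0] for e in ellipses) / n
    mu_y = sum(e[1] for e in ellipses) / n
    var_x = sum(e[2] + (e[0] - mu_x) ** 2 for e in ellipses) / n
    var_y = sum(e[3] + (e[1] - mu_y) ** 2 for e in ellipses) / n
    cov_xy = sum(e[4] + (e[0] - mu_x) * (e[1] - mu_y) for e in ellipses) / n
    return (mu_x, mu_y, var_x, var_y, cov_xy)

def entropy_generated(ellipses):
    """Entropy generated by merging all ellipses in the list into one."""
    def log_det(e):
        return math.log(e[2] * e[3] - e[4] ** 2)
    n = len(ellipses)
    return 0.5 * n * log_det(pooled(ellipses)) - 0.5 * sum(log_det(e) for e in ellipses)

def greedy_cluster(ellipses, s_crit):
    """Assign each ellipse to the first group it can join without the
    group's entropy generated exceeding s_crit; otherwise open a new group."""
    clusters = []
    for e in ellipses:
        for group in clusters:
            if entropy_generated(group + [e]) <= s_crit:
                group.append(e)
                break
        else:
            clusters.append([e])      # no existing group accepted this ellipse
    return clusters

def merge_clusters(clusters):
    """Replace every group by its single pooled ellipse."""
    return [pooled(group) for group in clusters]
```

Because each ellipse joins the first acceptable group, the outcome of this sketch can depend on the order in which the measurements are presented.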

While the present invention has been described in connection with the preferred embodiments of the various figures, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiment for performing the same function of the present invention without deviating therefrom. For example, the invention may be implemented by a general purpose computer having a processor, memory and computer program instructions stored therein. The equations, for example, may be implemented in a computer program product executed by the computer which causes the computer to process information received from, for example, a network, a database or other memory, and/or sensors. The present invention should not be limited to any single embodiment, but rather construed in breadth and scope in accordance with the recitation of the appended claims.

1. An information theoretic method of clustering and merging bi-variate normal data comprising the steps of: clustering the data into two or more error ellipses; and merging the data into a single error ellipse if the information lost in the merging process is sufficiently small.

2. The method according to claim 1, further comprising: determining a change in entropy associated with merging the ellipses.

3. The method according to claim 2, wherein the change in entropy is compared with a predetermined threshold to determine whether information lost in the merging process is sufficiently small.

4. The method according to claim 3, further comprising: determining the most probable location of a target based on the merging.

5. The method according to claim 1, wherein the data is one of global positioning system data, geographic information system data, bioinformatics data and pattern recognition data.

6. A computer program product having computer program logic stored therein, wherein the computer program logic comprises: clustering logic for causing a computer to cluster data that includes associated error into two or more error ellipses; and merging logic for causing the computer to merge the data into a single error ellipse if the information lost in the merging process is sufficiently small.

7. The computer program product according to claim 6, further comprising: determining logic for causing the computer to determine a change in entropy associated with merging the ellipses.

8. The computer program product according to claim 7, wherein the change in entropy is compared with a predetermined threshold to determine whether information lost in the merging process is sufficiently small.

9. The computer program product according to claim 8, further comprising: determining logic for causing the computer to determine the most probable location of a target based on the merging.

10. The computer program product according to claim 6, wherein the data is one of global positioning system data, geographic information system data, bioinformatics data and pattern recognition data.